Using python-docx to find where a table sits in a docx document

Latest recommended article published on 2024-08-19 03:25:28

Jeff Pan96

Category: python. Tags: python

Article link: https://blog.csdn.net/panjielove/article/details/104914892

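This article shows how to use the python-docx library in Python to find and extract the tables in a Word document together with the section each table belongs to. It applies to python-docx 0.8.6 on Python 3.x.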

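For work, I needed to extract the tables from a Word document along with the section each table sits in. The usual Document.paragraphs and Document.tables cannot do this: each returns its own separate list, so the relative order of paragraphs and tables is lost. Combining a GitHub author's code with my own requirements, I ended up with the following code: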
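Before the full listing, a small standard-library sketch of the underlying idea may help. It is not python-docx code: the XML fragment is a hand-written stand-in for the `<w:body>` of a document, the function names are made up for illustration, and it assumes heading paragraphs carry a style id starting with `Heading` (which can differ in localized or custom templates). The point is only that paragraphs (`w:p`) and tables (`w:tbl`) are siblings in the body, so walking them in order and remembering the last heading tells you which section each table is in.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by document.xml inside a .docx file.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written, simplified body for illustration only (not a real document).
BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>1 Overview</w:t></w:r></w:p>
  <w:p><w:r><w:t>Some text.</w:t></w:r></w:p>
  <w:tbl><w:tr><w:tc><w:p><w:r><w:t>A1</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>2 Details</w:t></w:r></w:p>
  <w:tbl><w:tr><w:tc><w:p><w:r><w:t>B1</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
</w:body>
"""


def element_text(elem):
    """Concatenate the text of every w:t run below elem."""
    return "".join(t.text or "" for t in elem.iter(f"{{{W}}}t"))


def is_heading(paragraph):
    """Assumption: a heading is a paragraph whose style id starts with 'Heading'."""
    style = paragraph.find("w:pPr/w:pStyle", NS)
    return style is not None and style.get(f"{{{W}}}val", "").startswith("Heading")


def tables_with_sections(body):
    """Walk the body's direct children in document order and pair each
    table with the most recent heading seen before it."""
    current_heading = None
    pairs = []
    for child in body:
        if child.tag == f"{{{W}}}p":
            if is_heading(child):
                current_heading = element_text(child)
        elif child.tag == f"{{{W}}}tbl":
            first_cell = child.find(".//w:tc", NS)
            pairs.append((current_heading, element_text(first_cell)))
    return pairs


body = ET.fromstring(BODY_XML)
pairs = tables_with_sections(body)
print(pairs)
```

The listing that follows performs the same kind of in-order walk, but yields python-docx Paragraph and Table objects instead of raw XML elements.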

from docx.document import Document
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph
import docx
import openpyxl
import xlsxwriter

def iter_block_items(parent):
    """
    Yield each paragraph and table child within *parent*, in document order.
    Each returned value is an instance of either Table or Paragraph. *parent*
    would most commonly be a reference to a main Document object, but
    also works for a _Cell object, which itself can contain paragraphs and tables.
    """
    if isinstance(parent, Document):
        parent_elm = parent.element.body
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    else:
        raise ValueError("something's not righ